BaseNP Supersense Tagging for Japanese Texts

نویسندگان

  • Hirotoshi Taira
  • Sen Yoshida
  • Masaaki Nagata
چکیده

This paper describes baseNP supersense tagging for Japanese texts. The task extracts base noun phrases (baseNPs) from raw texts in Japanese, and labels their baseNPs with supersenses. This task has a number of applications including predicate argument structure analysis and question answering. While the definition of baseNP in English is relatively clear, its definition in Japanese has not yet been clear. In this paper, we defined Japanese baseNP analogous to English and defined Japanese supersenses using a broad-coverage Japanese thesaurus, Nihongo Goi Taikei (comprehensive outline of Japanese vocabulary). We then adopted a sequential tagging algorithm for the task, namely the averaged perceptron with HMM, and achieve high performance compared to a baseline.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Coarse Semantic Classification of Rare Nouns Using Cross-Lingual Data and Recurrent Neural Networks

The paper presents a method for WordNet supersense tagging of Sanskrit, an ancient Indian language with a corpus grown over four millenia. The proposed method merges lexical information from Sanskrit texts with lexicographic definitions from Sanskrit-English dictionaries, and compares the performance of two machine learning methods for this task. Evaluation concentrates on Vedic, the oldest lay...

متن کامل

UFRGS&LIF at SemEval-2016 Task 10: Rule-Based MWE Identification and Predominant-Supersense Tagging

This paper presents our approach towards the SemEval-2016 Task 10 – Detecting Minimal Semantic Units and their Meanings. Systems are expected to provide a representation of lexical semantics by (1) segmenting tokens into words and multiword units and (2) providing a supersense tag for segments that function as nouns or verbs. Our pipeline rule-based system uses no external resources and was imp...

متن کامل

A Unified Statistical Model for the Identification of English BaseNP

This paper presents a novel statistical model for automatic identification of English baseNP. It uses two steps: the Nbest Part-Of-Speech (POS) tagging and baseNP identification given the N-best POS-sequences. Unlike the other approaches where the two steps are separated, we integrate them into a unified statistical framework. Our model also integrates lexical information. Finally, Viterbi algo...

متن کامل

Description and Results of the SuperSense Tagging Task

SuperSense tagging (SST) is a Natural Language Processing task that consists in annotating each significant entity in a text, like nouns, verbs, adjectives and adverbs, within a general semantic taxonomy defined by the WordNet lexicographer classes (called SuperSenses). SST can be considered as a task half-way between Named-Entity Recognition (NER) and Word Sense Disambiguation (WSD): it is an ...

متن کامل

A Resource and Tool for Super-sense Tagging of Italian Texts

A SuperSense Tagger is a tool for the automatic analysis of texts that associates to each noun, verb, adjective and adverb a semantic category within a general taxonomy. The developed tagger, based on a statistical model (Maximum Entropy), required the creation of an Italian annotated corpus, to be used as a training set, and the improvement of various existing tools. The obtained results signi...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009